empirical distribution
00482b9bed15a272730fcb590ffebddd-Supplemental.pdf
A.1 Training dataset pre-processing We used 40000publicly available videos from YouTube which were available in a spatial resolution of at least 1920 1080 pixels. In an attempt not to skew the distribution of content too far from what may inform biological representation learning, we excluded most artificial content such as screenshots and videos of computer games. To reduce video compression artifacts and prevent systematic downsampling artifacts, each segment was then spatially downsampled to a randomized height between 128 and 160. Each segment was then separated into 15 pairs of neighboring frames, and a randomly placed, but spatially colocated patch of 64 64 pixels was cropped out of each frame pair. The order of the frame pairs was then randomized in a running buffer, and all RGB pixel values were normalized to the range between 0 and 1 before being fed into the model.
Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification
We consider the fixed-budget best arm identification problem where the goal is to find the arm of the largest mean with a fixed number of samples. It is known that the probability of misidentifying the best arm is exponentially small to the number of rounds. However, limited characterizations have been discussed on the rate (exponent) of this value. In this paper, we characterize the minimax optimal rate as a result of an optimization over all possible parameters. We introduce two rates, Rgo and Rgo, corresponding to lower bounds on the probability of misidentification, each of which is associated with a proposed algorithm. The rate Rgo is associated with Rgo-tracking, which can be efficiently implemented by a neural network and is shown to outperform existing algorithms. However, this rate requires a nontrivial condition to be achievable. To address this issue, we introduce the second rate Rgo . We show that this rate is indeed achievable by introducing a conceptual algorithm called delayed optimal tracking (DOT).
Distributionally Robust K-Means Clustering
Malik, Vikrant, Kargin, Taylan, Hassibi, Babak
In recent years, the widespreadavailability of large-scale, high-dimensionaldatasets has driven significant interest in clustering algorithms that are both computationally efficient and robust to distributional shifts and outliers. The classical clustering method, K-means, can be seen as an application of the Lloyd-Max quantization algorithm, in which the distribution being quantized is the empirical distribution of the points to be clustered. This empirical distribution generally differs from the true underlying distribution, especially when the number of points to be clustered is small. This induces a distributional shift, which can also arise in many real-world settings, such as image segmentation, biological data analysis, and sensor networks, due to noise variations, sensor inaccuracies, or environmental changes. Distributional shifts can severely impact the performance of clustering algorithms, leading to degraded cluster assignments and unreliable downstream analysis. The field of clustering has a rich history. One of the most popular algorithms in this field is theK-means (KM) algorithm, introduced by [1], which computes centroids by iteratively updating the conditional mean of the data in the Voronoi regions induced by the centroids. However, standardK-means is sensitive to initialization and, in general, converges only to a local minimum.
Learning interacting particle systems from unlabeled data
Learning the potentials of interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled data collected at discrete time points lack trajectory information due to limitations in data collection methods or privacy constraints. We address this challenge by introducing a trajectory-free self-test loss function that leverages the weak-form stochastic evolution equation of the empirical distribution. The loss function is quadratic in potentials, supporting parametric and nonparametric regression algorithms for robust estimation that scale to large, high-dimensional systems with big data. Systematic numerical tests show that our method outperforms baseline methods that regress on trajectories recovered via label matching, tolerating large observation time steps. We establish the convergence of parametric estimators as the sample size increases, providing a theoretical foundation for the proposed approach.
Label Noise Cleaning for Supervised Classification via Bernoulli Random Sampling
Liu, Yuxin, Jin, Xiong, Han, Yang
Label noise - incorrect labels assigned to observations - can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label noise levels of subsets generated by Bernoulli random sampling containing a given observation are identically distributed for all clean observations, and identically distributed, with a different distribution, for all noisy observations. Although the mean label noise levels are not independent across observations, by introducing an independent coupling we further prove that they converge to a mixture of two well-separated distributions corresponding to clean and noisy observations. By establishing a linear model between cross-validated classification errors and label noise levels, we are able to approximate this mixture distribution and thereby separate clean and noisy observations without any prior label information. The proposed method is classifier-agnostic, theoretically justified, and demonstrates strong performance on both simulated and real datasets.
Instance-Optimal Private Density Estimation in the Wasserstein Distance
Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances.